This analysis looks at data on skier chairlift rides. A few sample skier “days” for multiple seasons (05-06 to 14-15) are used to draw general conclusions about the capability of modeling in enhancing prediction of skier visits and chair lift rides.
The analysis provides potential benefit to three groups: 1. Area-management. Predictive models provide skier load anlaysis allowing better management of hill-side resources. Details of chair loading can be correlated to date, day, and time, weather conditions, snow conditions, etc. 2. Management and Users: Models can detect fraudulent pass use, protecting purchasers and area managment from fraud. 3. Area-users: Analytics can provide insights to skiers on their performance and possibly enable marketing and other offers to tailor and optimize their on-hill and off-hill experience.
The purpose is to explore the feasiblity and reliability with which skier visits can be predicted and analyzed for the benefit of the hill operator.
A secondary purpose is to provide advanced analystics to individual skiers and explore whether “fraud” (that is the use of a ski pass by other than the owner) can be detected from this data.
Key findings include:
1. Skier behavior is remarkably reproducile on a day to day basis.
All ski runs data for a sample of individual skiers are documented. Note that this sample is much smaller than the total data available to the ski lift operator (who in principle has it all).
Just to give a flavor for the data, here is a sample of ski runs for an individual skier. Note the sample spans multiple years.
| time | chair | vertical_feet | vertical_meters | year | month | day | time_of_day | date | pass |
|---|---|---|---|---|---|---|---|---|---|
| 2014-04-11 14:20:04 | Summit | 1709 | 520 | 2014 | 4 | 11 | 14:20:04 | 2014-04-11 | 7b410007 |
| 2014-04-11 13:16:36 | Summit | 1709 | 520 | 2014 | 4 | 11 | 13:16:36 | 2014-04-11 | 7b410007 |
| 2014-04-11 12:09:55 | Pine Marten | 1367 | 416 | 2014 | 4 | 11 | 12:09:55 | 2014-04-11 | 7b410007 |
| 2013-04-12 12:08:23 | Summit | 1709 | 520 | 2013 | 4 | 12 | 12:08:23 | 2013-04-12 | 7b410007 |
| 2013-04-09 11:45:30 | Northwest | 2377 | 724 | 2013 | 4 | 9 | 11:45:30 | 2013-04-09 | 7b410007 |
| 2013-04-09 09:19:08 | Pine Marten | 1367 | 416 | 2013 | 4 | 9 | 09:19:08 | 2013-04-09 | 7b410007 |
| 2012-04-24 09:19:28 | Pine Marten | 1367 | 416 | 2012 | 4 | 24 | 09:19:28 | 2012-04-24 | 7b410007 |
| 2011-04-10 13:04:16 | Pine Marten | 1367 | 416 | 2011 | 4 | 10 | 13:04:16 | 2011-04-10 | 7b410007 |
| 2010-04-12 13:43:37 | Pine Marten | 1367 | 416 | 2010 | 4 | 12 | 13:43:37 | 2010-04-12 | 7b410007 |
Each skier makes a few dozen to several hundred runs per season. A way to meaningfully summarize the data is to first read in all the data and looks at total number of days skied.
Here’s a representation of the skiers per day for the 14-15 Season.
print(head(skier_day_count_14_15))
## all_ski_days Freq week yday season_week wday year season
## 1799 2015-03-01 80 9 59 18 Sunday 2015 14-15
## 1723 2014-12-13 51 50 346 7 Saturday 2014 14-15
## 1757 2015-01-16 49 3 15 11 Friday 2015 14-15
## 1821 2015-03-24 48 12 82 21 Tuesday 2015 14-15
## 1797 2015-02-27 43 9 57 17 Friday 2015 14-15
## 1730 2014-12-20 42 51 353 8 Saturday 2014 14-15
## normal_Freq skier_cum
## 1799 1.0000 1.0000
## 1723 0.6375 1.6375
## 1757 0.6125 2.2500
## 1821 0.6000 2.8500
## 1797 0.5375 3.3875
## 1730 0.5250 3.9125
Here’s a representation of the skiers per day for the 12-13 Season.
#print(head(skier_day_count_12_13))
## all_ski_days Freq week yday season_week wday year season
## 1419 2013-02-24 79 8 54 17 Sunday 2013 12-13
## 1418 2013-02-23 71 8 53 17 Saturday 2013 12-13
## 1461 2013-04-07 65 14 96 23 Sunday 2013 12-13
## 1370 2013-01-06 63 1 5 10 Sunday 2013 12-13
## 1336 2012-12-02 60 49 336 5 Sunday 2012 12-13
## 1391 2013-01-27 60 4 26 13 Sunday 2013 12-13
## normal_Freq skier_cum
## 1419 1.0000000 1.000000
## 1418 0.8987342 1.898734
## 1461 0.8227848 2.721519
## 1370 0.7974684 3.518987
## 1336 0.7594937 4.278481
## 1391 0.7594937 5.037975
Next, let’s explore looking at daily total runs and vertical feet.
Here is a sample of the data for total runs per day per skier.
| runs_per_day | date | pass | week | season_week | wday | mon | year | weekend | holiday | bizday | season |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 19.00 | 2013-02-24 | dc6e8366 | 8 | 13.00 | 0 | 1 | 2013 | TRUE | TRUE | FALSE | 12-13 |
| 12.00 | 2014-02-23 | 22f3ebd1 | 8 | 13.00 | 0 | 1 | 2014 | TRUE | TRUE | FALSE | 13-14 |
| 19.00 | 2013-02-03 | 22f3ebd1 | 5 | 10.00 | 0 | 1 | 2013 | TRUE | TRUE | FALSE | 12-13 |
| 14.00 | 2009-11-26 | 22f3ebd1 | 48 | 52.00 | 4 | 10 | 2009 | FALSE | TRUE | FALSE | 09-10 |
| 8.00 | 2015-04-09 | cf87f389 | 15 | 19.00 | 4 | 3 | 2015 | FALSE | FALSE | TRUE | 14-15 |
| 6.00 | 2012-12-30 | 9e9db4ff | 53 | 5.00 | 0 | 11 | 2012 | TRUE | TRUE | FALSE | 12-13 |
| 5.00 | 2013-03-16 | 4d129450 | 11 | 16.00 | 6 | 2 | 2013 | TRUE | TRUE | FALSE | 12-13 |
| 12.00 | 2011-02-06 | 4d129450 | 6 | 10.00 | 0 | 1 | 2011 | TRUE | TRUE | FALSE | 10-11 |
| 6.00 | 2009-11-27 | 4d129450 | 48 | 52.00 | 5 | 10 | 2009 | FALSE | FALSE | TRUE | 09-10 |
From random sample of data from 15 out of the 348 skiers a graph below show that runs per skier are fairly consistent from season to season.
We can also look at the number of vertical feet skied per day per skier. Again, most skiers seem to demonstrate fairly consistent behavior.
| total_vertical | date | pass | week | season_week | wday | year | weekend | holiday | bizday | season |
|---|---|---|---|---|---|---|---|---|---|---|
| 34569 | 2013-02-24 | dc6e8366 | 8.00 | 13.00 | 0 | 2013 | TRUE | TRUE | FALSE | 12-13 |
| 17815 | 2014-02-23 | 22f3ebd1 | 8.00 | 13.00 | 0 | 2014 | TRUE | TRUE | FALSE | 13-14 |
| 26122 | 2013-02-03 | 22f3ebd1 | 5.00 | 10.00 | 0 | 2013 | TRUE | TRUE | FALSE | 12-13 |
| 18589 | 2009-11-26 | 22f3ebd1 | 48.00 | 52.00 | 4 | 2009 | FALSE | TRUE | FALSE | 09-10 |
| 11342 | 2015-04-09 | cf87f389 | 15.00 | 19.00 | 4 | 2015 | FALSE | FALSE | TRUE | 14-15 |
| 11661 | 2012-12-30 | 9e9db4ff | 53.00 | 5.00 | 0 | 2012 | TRUE | TRUE | FALSE | 12-13 |
| 5097 | 2013-03-16 | 4d129450 | 11.00 | 16.00 | 6 | 2013 | TRUE | TRUE | FALSE | 12-13 |
| 18837 | 2011-02-06 | 4d129450 | 6.00 | 10.00 | 0 | 2011 | TRUE | TRUE | FALSE | 10-11 |
| 5407 | 2009-11-27 | 4d129450 | 48.00 | 52.00 | 5 | 2009 | FALSE | FALSE | TRUE | 09-10 |
A random sample of 15 of the current 348 skiers is used. Below are the first few rows of the data. (NOTE - PASS NUMBERS HAVE BEEN HASHED TO PRESERVE ANONYMITY)
There appears to be pretty good distinction between individual skiers. We can also check the data from season to season as below. Again, it appears to provide a reasonably consistent picture.
We need to add some more data to analysze skier speed. While the pass data tracks the vertical rise of the chairlift ride, the descent, which is the actual skiing part, is to the base of the next chairlift. Base elevations are not supplied with the downloaded data so need to be added. We can get some more data from the Master Development Plan and Google Maps
Ski run speed is affected by lots of factors, including the path chosen and terrain. To examine this we’ll look at the distrubution of runs split by chair lift and destination.
As might be expected, there’s a lot of variation in the speed. While some runs, such as Pine Marten to Summit, show almost uniform variation, the steeper runs returning to the same chairlift show the most data and most consisent variation.
Here’a a detailed look at a some chairlifts.
For the lift Pine Marten there are 4033 total runs for 15 skiers.
For the lift Summit there are 904 total runs for 12 skiers.
For the lift Northwest there are 1105 total runs for 15 skiers.
Another preference individual skiers have is for specific chairlifts. Let’s see if this can contribute to behavioral classification. To do this we’ll adopt a simple model looking at charilift patterns.
## Using chair_list as id variables
As can be seen there are strong variations in the choice of chairlifts between skiers.
set.seed(8675309+3)
## just get one pass
pass_sample<-"MBA6360970.csv"#sample(valid_pass, 1)
##calculate reference for rms
skier_probs<-NULL
chair_list<-rownames(look_up)
chair_list<-gsub(" ", "_", chair_list)
skier_probs<-as.data.frame(chair_list)
##get overall skier probs
pass<-pass_sample
## read in the pass data
ski_runs<-NULL
ski_runs<-read.csv(paste0(Directory, pass))
##fix Pine_Marten
ski_runs$chair<-gsub(" ", "_", ski_runs$chair)
## suppress pass_id column
ski_runs$pass_id<-NULL
## hash the pass identity
pass_s<-digest(pass, algo="xxhash32")
ski_runs$pass<-pass_s
prob<-chair_probs(ski_runs$chair, chair_list)
colnames(prob)<-pass_s
##handle exception on first pass
if (!is.null(skier_probs)) {skier_probs <- cbind(skier_probs, prob)} else {skier_probs<-prob}
## get dates
dates_list<-unique(ski_runs$date)
day_probs<-NULL
for (i_date in dates_list){
#i_date<-dates_list[1]
day_runs<-ski_runs[ski_runs$date==i_date,]
prob<-chair_probs(day_runs$chair, chair_list)
colnames(prob)<-i_date
##handle exception on first pass
if (!is.null(day_probs)) {day_probs <- cbind(day_probs, prob)} else {day_probs<-prob}
}
t<-day_probs
t<-t[,1:ncol(t)]-skier_probs[,2]
colMin <- function(data) sapply(data, min, na.rm = TRUE)
w<-colMin(t)
t<-sqrt(t*t)
colMax <- function(data) sapply(data, max, na.rm = TRUE)
y<-colMax(t)
z<-colSums(t)
#convert to long format
u<-melt(t)
## No id variables; using all as measure variables
v<-melt(y)
z<-melt(w)
u$date<-rownames(u)
v$date<-rownames(v)
z$date<-rownames(z)
day_probs$chair<-rownames(day_probs)
b<-melt(day_probs, id="chair")
colnames(b)<-c("chair", "date", "prob")
p<-ggplot(b, aes(x=date, y= prob, color = factor(chair)))
p <- p + geom_point()
print(p)
q<- ggplot(u, aes(x=date, y = value))
q<- q + geom_point(size = 3, color = "salmon")
print(q)
q<- ggplot(v, aes(x=date, y = value))
q<- q + geom_point(size = 3, color = "darkgreen")
print(q)
q<- ggplot(z, aes(x=date, y = abs(value)))
q<- q + geom_point(size = 2, color = "darkblue")
print(q)
# this is what ski_runs will look like
# time chair vertical_feet vertical_meters year month day time_of_day date pass
# 1 2006-03-11 11:14:33 Pine_Marten 1367 416 2006 3 11 11:14:33 2006-03-11 a016b213
# 2 2006-03-11 11:27:59 Pine_Marten 1367 416 2006 3 11 11:27:59 2006-03-11 a016b213
# 3 2006-03-11 11:43:00 Pine_Marten 1367 416 2006 3 11 11:43:00 2006-03-11 a016b213
# 4 2006-03-11 11:43:00 Pine_Marten 1367 416 2006 3 11 11:43:00 2006-03-11 a016b213
The approach will be to take the averaged probabilities and use the MLE’s for a single skier to calculate the expected probability for each ski day for a large sample of skiers.
Here is a sample of skier probabilities (expressed as logarithms) for a specific passholder.
| chair_list | 35292c89 |
|---|---|
| Sunrise | 0.16 |
| Rainbow | 0.06 |
| Skyliner | 0.12 |
| Summit | 0.13 |
| Sunshine | 0.01 |
| Pine_Marten | 0.21 |
| Red | 0.00 |
| Outback | 0.12 |
| Northwest | 0.19 |
The table above shows the probabilities for 35292c89.
Here are some sample probabilities
| date | pass | day_prob |
|---|---|---|
| 2005-12-09 | 35292c89 | 0.18 |
| 2005-12-10 | 35292c89 | 0.16 |
| 2005-12-28 | 35292c89 | 0.14 |
| 2005-12-29 | 35292c89 | 0.14 |
| 2005-12-30 | 35292c89 | 0.15 |
| 2006-01-01 | 35292c89 | 0.16 |
| 2006-01-28 | 35292c89 | 0.15 |
| 2006-01-29 | 35292c89 | 0.17 |
| 2006-02-11 | 35292c89 | 0.12 |
| 2006-02-20 | 35292c89 | 0.14 |
## date pass day_prob color_code
## 1 2005-12-09 35292c89 0.1784988 TRUE
## 2 2005-12-10 35292c89 0.1559082 TRUE
## 3 2005-12-28 35292c89 0.1405858 TRUE
## 4 2005-12-29 35292c89 0.1351913 TRUE
## 5 2005-12-30 35292c89 0.1538793 TRUE
## 6 2006-01-01 35292c89 0.1612128 TRUE